Abstract

We present a method for grouping the synonyms 
of a lemma according to its dictionary senses. 
The senses are defined by a large machine-readable
dictionary for French, the TLFi (Trésor de la langue
française informatisé), and the synonyms
are given by 5 synonym dictionaries (also
for French). To evaluate the proposed method, 
we manually constructed a gold standard where 
for each (word, definition) pair and given the 
set of synonyms defined for that word by the 
5 synonym dictionaries, 4 lexicographers speci- 
fied the set of synonyms they judge adequate. 
While inter-annotator agreement ranges on that 
task from 67% to at best 88% depending on the 
annotator pair and on the synonym dictionary 
being considered, the automatic procedure we 
propose scores a precision of 67% and a recall 
of 71%. The proposed method is compared with 
related work namely, word sense disambiguation, 
synonym lexicon acquisition and WordNet con- 
struction. 
1 Introduction

Synonymic resources for French are still limited in 
scope, quality and/or availability. Thus the French
WordNet (Frewn) created within the EuroWordNet
project [IS] has limited scope (3 777 verbs and 14 618
nouns vs. 7 384 verbs and 42 849 nouns in the morpho- 
logical lexicon for French Morphalou) and has not been 
widely used, mainly due to licensing issues. The alternative
open-source WordNet for French called Wolf
(WordNet Libre du Français, [23]) remedies the first
shortcoming (restrictive licensing) and aims to achieve
a wider coverage by automating the WordNet construction
process using an extend approach which, in
essence, translates the synsets from Princeton WordNet
(PWN) into French. However, compared to Morphalou,
Wolf is still incomplete (979 verbs and 34
827 nouns). Finally, the synonym lexicon DicoSyn [Hi]



*The research presented in this paper was partially supported
by the TALC theme of the CPER "Modélisation, Information
et Systèmes Numériques" funded by the Région Lorraine. We
also gratefully acknowledge the ATILF for letting us access their 
synonym database and the TLFi. 



is restricted to assigning sets of synonyms to lemmas 
thereby lacking both categorial information and defi- 
nitions. 

In this paper, we present a method for grouping syn- 
onyms by senses and evaluate it on the synonyms given 
by 5 synonym dictionaries included in the Atilf syn- 
onym database. The long term aim is to apply this 
method to these synonym dictionaries so as to build a 
uniform synonymic resource for French in which each 
lemma is assigned a part of speech, a set of (TLFi) 
definitions and for each given definition, a set of syn- 
onyms. The resulting resource should complement Di- 
coSyn and Wolf. Contrary to DicoSyn, it will in- 
clude categorial information and associate groups of 
synonyms with definitions. It will furthermore com- 
plement Wolf by providing an alternative synonymic 
resource which, being built on handbuilt high quality 
resources, should differ from Wolf both in coverage 
and in granularity. 

The paper is structured as follows. Section 2
presents the data we are working from, namely a set
of synonym dictionaries for French and the TLFi,
the largest machine readable dictionary available for
French. Section 3 describes the basic algorithm used
to assign a verb synonym to a given definition. Section 4
presents the experiments we did to assess the
impact of the similarity measures used and of a linguistic
preprocessing on the definitions. Section 5 discusses
related work. Section 6 concludes and gives
pointers for further research.

2 The source data 

We have at our disposal a general purpose machine
readable dictionary for French, the Trésor de la Langue
Française informatisé (TLFi, [HIE]) and 5 synonym
dictionaries namely, Dictionnaire des synonymes de la
langue française [2], Dictionnaire des synonymes [5],
Nouveau dictionnaire des synonymes [12], Dictionnaire
alphabétique et analogique de la langue française
[23], Grand Larousse de la Langue Française [17].

One driving motivation behind our method is the
question of how to merge these 5 synonym lexicons in a
meaningful way. Indeed, although one of them (namely,
[23]) covers most of the verbs present in the five synonym
lexicons (5 027 verbs out of 5 736), a merge of the
lexicons would permit an increased "synonymic coverage"
(11 synonyms on average per verb with the 5 lexicons
against 6 per verb using only [23]). To merge the



five lexicons, we plan to apply the method presented 
here to each of the synonyms assigned to a word by 
the 5 synonym lexicons. In this way, we aim to obtain
a merged lexicon in which each word is associated
with a part of speech, a set of TLFi definitions and,
for each definition, the set of synonyms associated with
this definition.

For our experiment, we worked on a restricted
dataset. First, we handled only verbs. Since they are
on average more polysemous¹ than other categories,
they nevertheless provide an interesting benchmark.
Second, we based our evaluation on a single synonym
dictionary, namely [23]. As mentioned above, this is
the largest of the five lexicons (cf. Fig. 2). Moreover,
it is unlikely that the quality of the results obtained
would vary greatly when considering more synonyms since,
as we shall see in Section 3, the synonym-to-definition
mapping performed by our method is independent of
the number of synonyms assigned to a given word.

The TLFi is the largest machine readable dictionary 
available for French (54 280 entries, 92 997 lemmas, 
271 166 definitions, 430 000 examples). It has a rich 
XML markup which supports a selective treatment of 
entry subfields. Moreover, the definitions have been 
part-of-speech tagged and lemmatised. 

For our experiment, we extracted from the TLFi all 
the verb entries and their associated definitions. Defi- 
nitions were extracted by selecting the XML elements 
identifying an entry definition and checking their content.
If a selected definitional element contained either
some text (i.e., a definition), a synonym or a domain 
specification, the XML element was taken to indeed 
identify a definition. Else, no definition was stored. In 
this way, XML elements that did not contain any defi- 
nitional information such as subdefinitions containing 
only examples, were not taken into account. 

For each selected definitional element, a definition 
index was then constructed by taking the open class 
lemmas associated with the definition and, if any, the 
synonyms and/or the domain information contained in 
the definitional element. For instance, given the TLFi
definitions for projeter (to project) listed in Fig. 1, the
indexes extracted will be as indicated below each definition.
In (a), the index contains the open class lemmas
of the definition; in (b), the domain information
is also included and in (c), synonymic information is
added.
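
The index construction described above can be sketched as follows. This is only an illustration: the `Definition` container, the `OPEN_CLASS` tag set and the part-of-speech tag names are assumptions introduced here, since the text does not specify the TLFi markup or tag inventory.

```python
from dataclasses import dataclass, field

# Hypothetical tag names for open class words (the TLFi tag set is not given in the text).
OPEN_CLASS = {"VERB", "NOUN", "ADJ", "ADV"}


@dataclass
class Definition:
    """Hypothetical container for one definitional element of an entry."""
    tokens: list = field(default_factory=list)    # (lemma, pos) pairs of the definition text
    domains: list = field(default_factory=list)   # expanded domain labels, if any
    synonyms: list = field(default_factory=list)  # (lemma, pos) pairs of the "Synon." text, if any


def is_definition(d):
    """An element counts as a definition if it holds text, a synonym or a domain."""
    return bool(d.tokens or d.domains or d.synonyms)


def build_index(d):
    """Domain labels, then open class lemmas of the definition text and of its synonyms."""
    index = list(d.domains)
    index += [lemma for lemma, pos in d.tokens if pos in OPEN_CLASS]
    index += [lemma for lemma, pos in d.synonyms if pos in OPEN_CLASS]
    return index


# Definition (b) of projeter in Fig. 1: "CIN. AUDIOVISUEL. Passer dans un projecteur."
definition_b = Definition(
    tokens=[("passer", "VERB"), ("dans", "ADP"), ("un", "DET"), ("projecteur", "NOUN")],
    domains=["cinema", "audiovisuel"],
)
print(build_index(definition_b))
```

Elements for which `is_definition` is false (for example subdefinitions holding only examples) would simply be skipped before indexing.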

The synonym dictionaries. The table in Fig. [2] 
gives a quantitative summary of the data contained 
in the five available synonym dictionaries. Each entry 
in the synonym dictionaries is associated with one or 
more sets of synonyms, each set corresponding to a dif- 
ferent meaning of the entry. The synonym dictionar- 
ies however contain neither part of speech information 
nor definitions. An example entry of [23] is given in
Figure 3. For the experiment, we extracted the verb
entries (using a morphological lexicon) of these dictio- 
naries that were also present in the TLFi. Synonyms 



1 The average polysemy recorded by the Princeton WordNet 
for the various parts of speech is: 2.17 for verbs, 1.4 for ad- 
jectives, 1.25 for adverbs and 1.24 for nouns. 



a. Jeter loin en avant avec force. 

To throw far ahead and with strength 

( jeter, loin, avant, force ) 

b. CIN. AUDIOVISUEL. Passer dans un projecteur. 

CIN. AUDIOVISUAL. To show on a projector. 

( cinema, audiovisuel, passer, projecteur ) 

c. Éclaircir. Synon. jeter quelque lumière

To lighten. Synonym. To throw some light. 

( lighten, throw, light ) 

Fig. 1: Some definitions and extracted indexes for
projeter (to project)



or entries that were present in the synonym dictionar- 
ies but not in the TLFi were discarded. 
Fig. 2: Verbs from TLFi, also present in the synonym
dictionaries. -Refl indicates the number of non reflex- 
ive verb entries (laver), +Refl the number of reflexive 
verb entries (se laver). 



Reference. To evaluate our results, we built a ref- 
erence sample as follows. First, we selected a sam- 
ple of French verbs using the combination of three 
features: genericity, polysemy and frequency. Each 
feature could have one of the three values "high" , 
"medium" and "low" thus yielding a sample of 27 
verbs. Genericity was assessed using the position of 
the verb in the French EuroWordNet (the higher the
more generic). Polysemy was defined by the number 
of definitions assigned to the verb by the TLFi. Fre- 
quency was extracted from a frequency list built from 
10 years of Le Monde newspaper parsed with the Syn- 
tex parser [7]. 
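
The sample size follows from the feature grid: three features with three values each give 3 × 3 × 3 = 27 profiles, one verb per profile. A minimal sketch (the function name `feature_profiles` is introduced here for illustration only):

```python
from itertools import product

FEATURES = ("genericity", "polysemy", "frequency")
LEVELS = ("high", "medium", "low")


def feature_profiles():
    """Enumerate every combination of levels over the three sampling features."""
    return [dict(zip(FEATURES, combo)) for combo in product(LEVELS, repeat=len(FEATURES))]


profiles = feature_profiles()
print(len(profiles))  # 27
print(profiles[0])
```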

Second, for these 27 verbs, we extracted the corresponding
definitions and synonyms from the TLFi and the
synonym dictionaries respectively. A dictionary entry
has a hierarchical structure (a definition can be the
child of another definition) which is often used by the
lexicographer to omit information in definitions
occurring lower down in the hierarchy, on the assumption
that the missing information is inherited from the
higher levels. To facilitate the assignment by the
annotators of a given synonym to a given definition,
we manually reconstructed the information that had
been omitted on this inheritance assumption. Note
though that this manual reconstruction is only
intended to facilitate the annotation task.



It does not affect the evaluation since the numbering of 
the definitions within a given dictionary entry remains 
the same and what is being compared is solely the as- 
signment of synonyms to definition identifiers made by 
the system and that made by the annotators. 

Third, we asked four professional lexicographers to 
manually assign synonyms to definitions. The lexicog- 
raphers were given for each verb v in the sample, the 
set of (possibly reconstructed) definitions assigned by 
the TLFi to v and the set of synonyms associated with v
by the synonym base. They then had to decide which
definition(s) each synonym should be associated with.

We computed the agreement rate between pairs of
annotators and among all four annotators. No pair achieved
perfect agreement. The proportion of triples on
which two annotators agree ranges from 74.06% (lowest)
to 87.07% (highest), and the agreement rate for
all four annotators was lower still, at 63.37%. This
indicates that matching synonyms with definitions is a
difficult task even for humans. On the other hand,
the reasonably high pairwise agreement suggests that the
sample provides a sound basis for evaluation. Accordingly,
we used the rating produced by the first annotator
of the pair with the highest agreement as the
reference against which our system is evaluated.
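
One way to compute such agreement rates is sketched below. It assumes that agreement is measured over all candidate (verb, synonym, definition) triples and that annotators agree on a triple when they make the same accept/reject decision; the text does not spell out the exact protocol, so this reading and the function names are assumptions.

```python
from itertools import combinations


def agreement_rate(candidates, *annotations):
    """Share of candidate triples on which all given annotators make the same decision.

    candidates: set of (verb, synonym, definition_id) triples offered to annotators.
    annotations: one set per annotator, holding the triples that annotator accepted.
    """
    if not candidates:
        return 0.0
    agreed = sum(1 for t in candidates if len({t in a for a in annotations}) == 1)
    return agreed / len(candidates)


def pairwise_agreement(candidates, annotators):
    """Agreement rate for every pair of annotators, keyed by the pair of names."""
    return {
        (a, b): agreement_rate(candidates, annotators[a], annotators[b])
        for a, b in combinations(sorted(annotators), 2)
    }


# Toy data (invented for illustration, not taken from the reference sample).
candidates = {("v", "s1", 1), ("v", "s1", 2), ("v", "s2", 1), ("v", "s2", 2)}
annotators = {
    "A": {("v", "s1", 1), ("v", "s2", 2)},
    "B": {("v", "s1", 1), ("v", "s2", 1)},
}
print(pairwise_agreement(candidates, annotators))
```

Passing all four annotation sets to `agreement_rate` at once gives the stricter four-way figure.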



3 The basic procedure 

Given a verb $V$, a synonym $Syn_V$ of that verb and a
set of definitions $D_V = \{d_1 \ldots d_n\}$ given for $V$ by the
TLFi, the task is to identify the definitions $d_i \in D_V$
of $V$ for which $Syn_V$ is a synonym of $V$.



Mapping synonyms to definitions. To assign a
synonym $Syn_V$ to a definition $d_i$ of $V$, we proceed
as follows. First, we compare the index of the merged
definitions of $Syn_V$ with the index of each definition
$d_i \in D_V$ using a gloss-based similarity measure. Note
that since the intended meaning of the synonym is not
given, we do not attempt to identify it and use as the
basis for comparison the union of the definitions given
by the TLFi for each synonym. Next, the synonym
$Syn_V$ is mapped onto the definition that gets the highest
(non null) similarity score.
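
The mapping step can be sketched as an argmax over the verb's definitions. The function names are illustrative, the similarity measure is passed in as a parameter, and returning every definition tied for the best score is an assumption (the text only says the highest non null score wins).

```python
def merged_index(definition_indexes):
    """Concatenate the indexes of all definitions of a synonym (its sense is unknown)."""
    merged = []
    for index in definition_indexes:
        merged.extend(index)
    return merged


def map_synonym(verb_def_indexes, synonym_def_indexes, similarity):
    """Return the ids of the verb definitions with the highest non-zero similarity.

    verb_def_indexes: dict mapping a definition id to its index (list of lemmas).
    synonym_def_indexes: list of indexes, one per definition of the synonym.
    similarity: any function (index, index) -> number.
    """
    syn_index = merged_index(synonym_def_indexes)
    scores = {d: similarity(idx, syn_index) for d, idx in verb_def_indexes.items()}
    best = max(scores.values(), default=0)
    if best <= 0:
        return []  # no definition shares anything with the synonym
    return [d for d, score in scores.items() if score == best]


# Toy example with a stand-in similarity (number of shared lemmas).
verb_defs = {
    "a": ["jeter", "loin", "avant", "force"],
    "b": ["cinema", "audiovisuel", "passer", "projecteur"],
}
synonym_defs = [["jeter", "force"], ["envoyer"]]
print(map_synonym(verb_defs, synonym_defs, lambda x, y: len(set(x) & set(y))))
```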



Evaluation. We evaluated the results obtained with 
respect to the reference sample presented in the pre- 
vious section as follows. 

From the reference, we extracted the set of tuples
$(V, Syn_V, Def_i)$ such that $Syn_V$ is a synonym of $V$
which is associated with the definition $Def_i$ of $V$.

Recall is then the number of correct tuples produced 
by the system divided by the total number of tuples 
contained in the reference. Precision is the number of 
correct tuples produced by the system divided by the 
total number of tuples produced by the system. 
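
These two definitions translate directly into set operations over tuples. The sketch below also includes a balanced F-measure (harmonic mean of precision and recall); that particular formula is an assumption, since the text does not state which F-measure is reported.

```python
def precision_recall(system, reference):
    """Precision and recall of system tuples against reference tuples."""
    system, reference = set(system), set(reference)
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Balanced F-measure (assumed here; the weighting is not given in the text)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy tuples (verb, synonym, definition id), invented for illustration.
system = {("v", "s1", 1), ("v", "s2", 1), ("v", "s3", 2)}
reference = {("v", "s2", 1), ("v", "s3", 2), ("v", "s3", 3), ("v", "s4", 1)}
p, r = precision_recall(system, reference)
print(p, r, f_measure(p, r))
```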

The baseline gives the results obtained when ran- 
domly assigning the synonyms of a verb to its defini- 
tions. 
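
A random baseline of this kind might look as follows. The sketch assumes that each synonym is assigned to exactly one definition drawn uniformly at random; the text does not say how many definitions the baseline assigns per synonym, and the function name and seed parameter are illustrative.

```python
import random


def random_baseline(synonyms_by_verb, definitions_by_verb, seed=0):
    """Assign every synonym of a verb to one of its definitions, chosen uniformly.

    synonyms_by_verb: dict verb -> list of synonyms.
    definitions_by_verb: dict verb -> list of definition ids.
    Returns a set of (verb, synonym, definition_id) tuples.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    tuples = set()
    for verb, synonyms in synonyms_by_verb.items():
        def_ids = definitions_by_verb.get(verb, [])
        if not def_ids:
            continue  # nothing to assign to
        for synonym in synonyms:
            tuples.add((verb, synonym, rng.choice(def_ids)))
    return tuples


print(random_baseline({"projeter": ["jeter", "lancer"]}, {"projeter": ["a", "b", "c"]}))
```

The resulting tuples can be scored against the reference in the same way as the system output.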



4 Experiments 

4.1 Comparing similarity measures 

To assess the impact of the similarity method
used, we applied the 6 similarity measures listed
in Table 1 namely, simple word overlap, extended
word overlap, extended word overlap normalised,
1st order vectors and 2nd order vectors
with and without a tf.idf threshold. These methods
were implemented using Ted Pedersen's Perl library
search.cpan.org/dist/WordNet-Similarity/ and
adapting it to fit our data².

Simple word overlap. Simple word overlap between
glosses was introduced by [18] to perform word
sense disambiguation. The Lesk Algorithm which is 
used there, assigns a sense to a target word in a given 
context by comparing the glosses of its various senses 
with those of the other words in the context. That 
sense of the target word whose gloss has the most 
words in common with the glosses of the neighbouring 
words is chosen as its most appropriate sense. 

Similarly, here we use word overlap to assess the 
similarity between a verb definition and the merged 
definitions of a synonym. Given a set of verb defini- 
tions and a synonym, the synonym will be matched 
to the definition(s) with which its definitions have the
most words in common (and at least one). 

Extended word overlap. The scoring mechanism 
of the original Lesk Algorithm does not differentiate 
between single word and phrasal overlaps. [3] modifies 
the Lesk method of comparison in two ways. First, 
the glosses used for comparison are extended by those 
of related WordNet concepts and second, the scor- 
ing mechanism is modified to favour glosses containing 
phrasal overlaps. An $n$-word overlap is assigned a score
of $n^2$. Because the French EuroWordNet is relatively
under-developed³, we did not modify the comparison to
take into account WordNet related glosses⁴. We did
however modify it to take into account phrase overlaps
using the same scoring mechanism as Banerjee
and Pedersen in [3]⁵.

Extended word overlap normalised. The ex- 
tended word overlap is normalised by the number of 
words occurring in the definitions being compared. 
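
The phrasal scoring and its normalised variant can be sketched as below. The sketch follows the usual reading of the extended overlap measure: repeatedly take the longest run of lemmas common to both indexes, add the square of its length, and remove it from both sides. Normalising by the summed length of the two indexes is an assumption, as are the function names; this is not the Perl library's code.

```python
def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest common contiguous run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # prev[j]: length of the common run ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # None marks an already consumed run and never matches anything.
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[0]:
                    best = (curr[j], i - curr[j], j - curr[j])
        prev = curr
    return best


def extended_overlap(definition_index, synonym_index):
    """Sum of n squared over successive longest common runs of n lemmas."""
    a, b = list(definition_index), list(synonym_index)
    score = 0
    while True:
        n, i, j = longest_common_run(a, b)
        if n == 0:
            return score
        score += n * n
        # Replace each matched run by a single None so it cannot be reused
        # and so its neighbours are not joined into a spurious phrase.
        a[i:i + n] = [None]
        b[j:j + n] = [None]


def normalised_overlap(definition_index, synonym_index):
    """Extended overlap divided by the total number of lemmas compared (assumed)."""
    total = len(definition_index) + len(synonym_index)
    return extended_overlap(definition_index, synonym_index) / total if total else 0.0


x = ["jeter", "quelque", "lumiere", "force"]
y = ["force", "jeter", "quelque", "lumiere"]
print(extended_overlap(x, y), normalised_overlap(x, y))
```

In the toy example the three-lemma run contributes 9 and the isolated shared lemma contributes 1, whereas simple word overlap would count 4 shared lemmas.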

First order vectors. A first order word vector for a 
given word indicates all the first order co-occurrences 

2 In particular, calls to the Princeton WordNet were removed. 

3 The French EuroWordNet (Frewn) contains 3 777 verbs.
Since [23] alone lists 5 027 verbs, it is clear that a Frewn 
based extended gloss overlap measure would only partially 
be applicable. 

4 As mentioned in the introduction, an alternative WordNet
for French is being developed by [24]. It cannot be used to
integrate in the comparison glosses of WordNet related words 
however because the glosses associated with synsets are the 
Princeton WordNet English glosses. 

5 Recall (cf. Section 2) that the index of a definition is the
list of lemmas for the open class words occurring in that def- 
inition. The order in the list reflects the linear order of the 
corresponding words in the definition. 



of that word found in a given context (e.g., a TLFi definition).
Similarity between words can then be computed
using some vector similarity measure. For each
verb $V$, we build weighted word vectors for each of
its definitions $d_V^i$ and for each of its synonyms. The
dimensions of these vectors are the lemmatised words
occurring in the definitions of $V$ whose tf.idf is different
from 0. The similarity score between a verb
definition $d_V^i$ and a synonym $Syn_V$ is the product of
the two corresponding vectors.
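
A minimal sketch of this scoring, using the weighting given in footnote 6 and reading "product" as the dot product of the two vectors. The tf.idf filter on dimensions is left out for brevity, and the function name is illustrative.

```python
from collections import Counter


def first_order_scores(verb_def_indexes, synonym_def_indexes):
    """Dot product between each verb definition vector and the synonym vector.

    Dimensions are the lemmas occurring in the definitions of the verb.
    Weights follow footnote 6: occurrences in the definition (or in the
    synonym's definitions) divided by occurrences in all definitions of the verb.
    """
    all_counts = Counter(l for idx in verb_def_indexes.values() for l in idx)
    syn_counts = Counter(l for idx in synonym_def_indexes for l in idx)
    syn_vector = {w: syn_counts[w] / all_counts[w] for w in all_counts}

    scores = {}
    for def_id, idx in verb_def_indexes.items():
        def_counts = Counter(idx)
        scores[def_id] = sum(
            (def_counts[w] / all_counts[w]) * syn_vector[w] for w in def_counts
        )
    return scores


verb_defs = {"a": ["jeter", "force"], "b": ["passer", "projecteur"]}
synonym_defs = [["jeter", "loin"]]
print(first_order_scores(verb_defs, synonym_defs))
```

Lemmas that occur only in the synonym's definitions (here "loin") fall outside the vector space and do not contribute to the score.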



Second order word vectors with and without 
tf.idf cutoff. Second order vectors are derived from 
first order vectors as follows. For each verb/synonym 
definition, the corresponding second order vector is the 
sum of the first order vectors⁷ defined over the words
occurring in this definition. The second order vectors 
"average" the direction of a set of vectors. If many 
of the words occurring in the definition have a strong 
component in one of the dimensions, then this dimension
will be strong in the second order vector. In other
words, the second order vector helps pin down the
strength of the different dimensions in a given definition.

The similarity score between a verb definition and 
a synonym is the product of the two corresponding 
second order vectors. We compare two versions of the 
second order word vectors approach, one where a tf.idf 
cut-off is used to trim down the word space and an- 
other where it isn't. 
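
The construction can be sketched as follows. First order vectors are taken here to be raw co-occurrence counts within definitions of the whole dictionary, and the product is read as a dot product; both choices, like the function names, are assumptions. The tf.idf cut-off variant, which would prune dimensions before summing, is not shown.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(all_definition_indexes):
    """First order vector of each lemma: counts of lemmas sharing a definition with it."""
    vectors = defaultdict(Counter)
    for index in all_definition_indexes:
        lemmas = set(index)
        for w in lemmas:
            for other in lemmas:
                if other != w:
                    vectors[w][other] += 1
    return vectors


def second_order_vector(index, vectors):
    """Sum of the first order vectors of the lemmas occurring in a definition index."""
    total = Counter()
    for w in index:
        total.update(vectors.get(w, {}))
    return total


def dot(u, v):
    """Dot product of two sparse vectors stored as Counters."""
    return sum(u[k] * v[k] for k in u if k in v)


# Toy dictionary (invented): "jeter" and "envoyer" never co-occur,
# but both co-occur with "lancer".
dictionary = [["jeter", "lancer"], ["lancer", "envoyer"], ["passer", "film"]]
vectors = cooccurrence_vectors(dictionary)
u = second_order_vector(["jeter"], vectors)
v = second_order_vector(["envoyer"], vectors)
w = second_order_vector(["passer"], vectors)
print(dot(u, v), dot(u, w))
```

The toy example shows what the second order representation adds: two indexes with no lemma in common can still receive a non-zero score when their lemmas share co-occurring words.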

The results obtained by the various measures are 
given in Table 1 (left side).

A first observation is that our synonym-to-definition 
mapping procedure systematically outperforms the 
random assignment baseline. Thus, despite the brevity 
of dictionary definitions, gloss based similarity mea- 
sures appear to be reasonably effective in associating 
a synonym with a definition on the basis of its own 
definitions. 

A second observation is that no similarity measure 
clearly yields better results than the others. This sug- 
gests that word overlap between TLFi definitions is a 
richer source of information for synonym sense disam- 
biguation (SSD) than other more indirect contextual 
cues such as the distributional similarity of the words 
occurring in the definitions (first order word vector ap- 
proach) or of the words defined by the words occurring 
in the definitions (second order word vector approach). 



6 The weights are computed as follows: For a definition $d_V^i$,
the weight of each word $w_j$ is the number of occurrences of
$w_j$ in $d_V^i$ divided by the number of occurrences of $w_j$ in all
definitions of $V$. For a synonym $Syn_V$, the weight of $w_j$ is
the number of occurrences of $w_j$ in the definitions of $Syn_V$
divided by the number of occurrences of $w_j$ in all definitions
of $V$.

7 In contrast to the vectors used in the first order approach, 
the dimensions of the first-order vectors used to compute the 
2nd order vectors are the lemmatised open class words of all 
definitions in the TLFi (not just the words occurring in the 
definitions of a given verb). 

9 Please note that the values shown here have been computed 
with higher precision and then rounded, therefore some dif- 
ferences in scores may no longer be visible. 
Table 1: Precision, recall and F-measure for
various similarity measures, with (right side)
and without (left side) reflexive/non reflexive
distinction. The similarity measures are the following:
Over 1: Simple word overlap, Over 2: Extended
word overlap, Over 3: Extended word overlap normalised,
WV 1: First order vectors, WV 2: Second
order vectors, without tf.idf cut-off, WV 3: Second order
vectors, with tf.idf cut-off. Best scores are set in
bold face⁹.



Abandonner: (1) se dessaisir, renoncer à, se
déposséder, se dépouiller, abdiquer, se démettre,
démissionner, se désister, résigner, renoncer à, sacrifier,
céder, confier, donner, léguer (2) concéder,
accorder (3) exposer (ancient), délaisser, lâcher,
tomber, larguer (fam.), plaquer (fam.) . . .
S'abandonner: se livrer, succomber, céder, se donner,
s'épancher, se fier, se reposer sur

Fig. 3: Sample (reflexive and non-reflexive) synonym
dictionary entry of (s')abandonner (to abandon).



4.2 Linguistic preprocessing 

A single TLFi verb entry might encompass several 
very different uses/meanings of this verb. Typically, 
it might include definitions that relate to the reflex- 
ive use of that verb, to a non reflexive use and/or to 
collocational use. 

The approach presented in the previous section does
not take such distinctions into account and is therefore
prone to compare apples and oranges. It will for instance
select the synonyms of a verb V and match
these onto all its definitions independent of whether
these definitions reflect a reflexive or a non reflexive
usage. This is clearly incorrect because the synonyms
of a verb V are not necessarily synonyms of its reflexive
form. For example, the synonyms of the non reflexive
form abandonner (to abandon) listed in Fig. 3
are clearly distinct from those of the reflexive form
s'abandonner (to give way).

Hence matching e.g., the synonyms of abandonner 
onto definitions corresponding to a reflexive use of the 
verb will result in incorrect synonym/definition asso- 
ciations. 

To account for these observations, we developed an 
approach that aims to take into account the reflex- 
ive/non reflexive distinction. The approach differs 
from the procedure described in the previous section 
as follows: First, we automatically differentiated both 
in the handbuilt reference and in the automatically 
extracted verb entries between the reflexive and the 



non reflexive usage of a verb. For each verb with the 
two types of usage, we constructed two entries each 
with the appropriate definitions. The synonym selec- 
tion is then done with respect to a verb entry i.e., with 
respect to either a reflexive or a non reflexive usage. 

As a result, similarity measures were applied be- 
tween the definitions of verbs corresponding to the 
same type of usage. In other words, the definitions 
of a synonym associated with a given verb usage (re- 
flexive vs. non-reflexive) were compared only with the 
definitions of this particular usage. 
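
The restriction can be sketched as a split of each verb entry into two sub-entries before any similarity is computed. The sketch assumes that every definition already carries a reflexive/non reflexive flag; detecting that flag automatically in the TLFi is the hard part and is not shown. Function names and the toy indexes are illustrative.

```python
def split_by_usage(definitions):
    """Split the definitions of one verb into non reflexive and reflexive sub-entries.

    definitions: iterable of (definition_id, is_reflexive, index) triples.
    Returns {False: {id: index}, True: {id: index}}.
    """
    entries = {False: {}, True: {}}
    for def_id, is_reflexive, index in definitions:
        entries[is_reflexive][def_id] = index
    return entries


def candidate_definitions(entries, synonym_is_reflexive_usage):
    """Definitions a synonym may be compared with, given the usage it is listed under."""
    return entries[synonym_is_reflexive_usage]


# Toy entry (invented indexes) for abandonner / s'abandonner.
entry = [
    ("d1", False, ["renoncer", "chose"]),
    ("d2", False, ["laisser", "personne"]),
    ("d3", True, ["laisser", "aller", "sentiment"]),
]
entries = split_by_usage(entry)
print(candidate_definitions(entries, True))   # synonyms listed under s'abandonner
print(candidate_definitions(entries, False))  # synonyms listed under abandonner
```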

The results obtained on the basis of this modified 
procedure are given in Table 1 (right side).

Unsurprisingly, while precision increases, recall decreases.
The increase in precision indicates that this
linguistically more constrained approach does indeed 
support a better matching between synonyms and def- 
initions. The decrease in recall can be explained by 
several factors. First, the information contained in the 
TLFi concerning reflexive and non reflexive usage is 
irregular so that it is sometimes difficult to automat- 
ically distinguish between the definition of a reflexive 
usage and that of a non reflexive usage. Second, the 
synonym dictionary might fail to provide synonyms for 
a reflexive usage listed by the TLFi. Third, a reflex- 
ive verb listed in the synonym dictionary might fail to 
have a corresponding entry (and hence definition) in 
the TLFi. All of these cases introduce discrepancies 
between the reference and the system results thereby 
negatively impacting recall. 

In short, while a finer linguistic processing of the 
data contained in the TLFi might help improve preci- 
sion, a better recall would involve enriching both the 
synonym and the TLFi dictionaries. 

5 Related work 

Our work has connections to several research areas 
namely, word sense disambiguation (we aim to identify 
the meaning of a synonym and more specifically, to 
map a synonym to one or more dictionary definitions 
associated by a dictionary with the verb of which it 
is a synonym), synonym lexicon acquisition (we plan 
to use the method presented here to merge the five 
synonym lexicons into one) and WordNet construction
(by identifying sense based synonym sets i.e., synsets). 

Word sense disambiguation (WSD) uses four 
main types of approaches namely, lexical knowledge- 
based methods which rely primarily on dictionaries, 
thesauri, and lexical knowledge bases [UlEI], with- 
out using any corpus evidence; supervised and semi- 
supervised approaches [20] which make use of sense 
annotated data to train or start from and unsuper- 
vised methods [22] . 

The approach presented here squarely fits within 
the lexical knowledge-based methods in that it ex- 
clusively uses dictionary definitions to disambiguate 
words. Supervised and semi-supervised approaches 
were not considered because of the absence of sense 
annotated data for French. Moreover, as shown by 
the construction of the reference sample and the agreement
rate obtained (cf. Section 2), the fact that we
are working on disambiguating synonyms (as opposed 



to a set of arbitrary words) out of context makes sense 
annotation a lot more difficult than for the standard 
WSD task. 

It would in principle be possible to use an unsu- 
pervised approach and attempt to disambiguate syn- 
onyms on the basis of raw corpora. Such approaches 
however are not based on a fixed list of senses where 
the senses for a target word are a closed list coming 
from a dictionary. Instead they induce word senses di- 
rectly from the corpus by using clustering techniques, 
which group together similar examples. To associate 
synonyms with definitions, it would therefore be neces- 
sary to define an additional mapping between corpus 
induced word senses and dictionary definitions. As 
noted in pQ, such a mapping usually introduces noise 
and information loss however. 

Synonym lexicon construction. As noted above 
and further discussed in Section 6, the method described
in this paper can be used to merge the five
synonym dictionaries mentioned in Section 2 into a single
one. In this sense, it is related to work on synonym
lexicon construction. Much work has recently focused 
on extracting synonyms from dictionaries and/or from 
corpora to build synonym lexicons or thesauri. Thus, 
[T51 [HI 03] extract synonyms from large monolingual 
corpora based on the idea that similar words occur in 
similar context; [1] used a bilingual corpus; [B] use the 
structure of monolingual dictionaries; and [25] combine
both monolingual and bilingual resources. Such
approaches are fundamentally different from the work 
presented here in two main ways. First, they aim to 
extract synonyms from linguistic data and thereby of- 
ten yield "associative" lexicons rather than synonymic 
ones. In other words, these approaches yield lexicons 
which often associate with a word, synonyms but also 
antonyms, hypernyms or simply words that belong to 
the same semantic field. In contrast, we work on a 
predefined base of synonyms and the lexicon we pro- 
duce is therefore a purely synonymic lexicon. Second, 
whereas we associate synonyms with a predefined list 
of senses, existing work on synonym lexicon construc- 
tion usually doesn't and is restricted to identifying sets 
of synonyms (or semantically related words). 

WordNet and thesaurus construction. Group- 
ing synonyms in sets reflecting their possible senses 
effectively boils down to identifying synsets i.e., sets 
of words having a common meaning. In this sense, 
our work has some connections with work on Word- 
Net development and more precisely, with a merge ap- 
proach to WordNet development that is, with an ap- 
proach that aims to first create a WordNet for a given 
language and then map it to existing WordNets. Recently,
[24, 13] have presented an extend approach to
WordNet construction for French based on a parallel
corpus for 5 languages (French, English, Romanian,
Czech, Bulgarian). Briefly, the approach consists in
first extracting a multilingual lexicon from the aligned
parallel corpora and second, in using the BalkaNet
WordNets to disambiguate polysemous words. The
approach relies on the fact that the WordNets for English,
Romanian, Czech and Bulgarian all use the same
synset identifiers. First, the synset identifiers of the



translations of the French words are gathered. Second, 
the synset identifier shared by all translations is assigned
to the French word. In this way, and using various
other techniques and resources to assign a synset identifier
to monosemous words, [24, 13] produce a WordNet
for French called WOLF (freely available WordNet
for French) that replicates the Princeton WordNet
structure.

Like work on synonym extraction, the WOLF ap- 
proach differs from ours in that synonyms are auto- 
matically extracted from linguistic data (i.e., a parallel 
corpus and the BalkaNet WordNets) rather than taken
from a set of existing synonym dictionaries, thereby introducing
errors in the synsets. [24, 13] report a precision
of 63.2% for verbs with respect to the French EuroWordNet.
A second difference is that our approach
associates synsets with a French definition (from the 
TLFi) rather than an English one (from the Princeton 
WordNet via the synset identifier). A third difference 
is that we do not map definitions to a Princeton Word- 
Net synset identifier and therefore cannot reconstruct 
a network of lexical relations between synsets. More 
generally, the two approaches are complementary in 
that ours provides the seeds for a merge construction 
of a French WordNet whilst [24, 13] pursue an extend
approach. 



6 Conclusion and future work 

We have presented an automatic method for assigning 
synonyms to definitions with a reasonably high F-score 
of at best, 0.70 (P=0.67,R=0.71). Future work will 
focus on two main points. 

First, we will explore ways of improving these results.
In particular, we will investigate to what extent
the structure of a dictionary entry can be used to enrich
a definition. As mentioned in Section 2, a dictionary
entry has a hierarchical structure which is often
used by the lexicographer to omit information in def- 
initions occurring lower down in the hierarchy. Au- 
tomatically enriching the TLFi definitions by inherit- 
ing information from higher up in the dictionary entry 
might result in definitions which, because they contain 
more information, provide a better basis for similarity 
measures. Similarly to the separate treatment
of reflexive/non reflexive usages discussed in Section
4.2, we will also develop a separate treatment of definitions
involving verbal collocations (as opposed to
isolated verbs).

Second, we will use this method to merge the syn- 
onym dictionaries into one where each word is asso- 
ciated with a set of (TLFi) definitions and each def- 
inition with a set of synonyms. We will then inves- 
tigate, on the basis of the resulting merged synonym 
dictionary, how to reconstruct the lexical relation links 
used in WordNet. To this end, we intend to explore
to what extent translation and ontology enrichment techniques
[10] can be applied to enrich our synonym lexicon
and align it with the Princeton WordNet. In this
way, we can build on the WordNet structure given by 
the Princeton WordNet and enrich the synsets derived 
from the five synonym dictionaries with translations 
of the related English synonyms. 